Practical - Week 4 Pt.2
(Department of Spatial Sciences)
2022-11-13
Species Distribution Modelling using MaxEnt
Entropy is a measure of uncertainty or disorder in a system.
The principle of maximum entropy states that the most appropriate distribution to model a given set of data is the one with highest entropy among all those that satisfy the constraints of our prior knowledge.
It allows you to find the distribution that makes the fewest assumptions about your data (the one with maximal information entropy)
Maxent (Maximum Entropy) is a statistical modeling approach based on the Maximum Entropy principle.
It finds the probability distribution that maximizes entropy, subject to the constraints imposed by the available information (e.g. environmental variables).
It requires presence data (presence-background model) and it is widely used for SDMs.
It provides estimates of relative suitability, not occupancy probability.
Variables that we will use
Base map of our area of study
Getting the number of occurrence records
[1] 1143
Download occurrence data using rgbif
We inspect the downloaded occurrences plotting them in a world map using rnaturalearth, sf and ggplot2
Clean the data using CoordinateCleaner
Plot the data to check if it makes sense after cleaning
Getting WorldClim variables using the package geodata for our area of study (Czechia)
Keeping only one observation per grid cell. We can use spatSample from the terra package or thin from the spThin package.
Plotting the thinned data to check if it makes sense
To generate the background data, we sample random points from our area of study using the sampleRandom function from the raster package.
Extract the climatic data for our presences and background points using extract from the terra package.
Identifying variables that have a correlation of over 70%
Selecting the variables in our data
# Selecting the most relevant and uncorrelated variables
my_preds <- c( "wc2.1_30s_bio_1", # Annual mean temperature
"wc2.1_30s_bio_2", # Mean diurnal range
"wc2.1_30s_bio_4", # Temperature seasonality
"wc2.1_30s_bio_9", # Mean temperature of driest quarter
"wc2.1_30s_bio_17", # Precipitation of driest quarter
"wc2.1_30s_bio_12") #Annual precipitation
# Filtering the relevant columns of the data
vanat_data <- vanat_obs_clim %>%
dplyr::select(any_of(my_preds))
bg_data <- bg_clim %>%
dplyr::select(any_of(my_preds))Renaming the variables
Joining presence and background data. Maxent identifies presences as 1 and background points as 0.
We will fit the Maxent model using the dismo package
Plotting the relative variable contribution.
Which is the variable that contributes the most?
Viewing the results of the model
[,1]
X.Training.samples 550.0000
Regularized.training.gain 0.1169
Unregularized.training.gain 0.1411
Iterations 380.0000
Training.AUC 0.6486
X.Background.points 1334.0000
wc2.1_30s_bio_1.contribution 4.6424
wc2.1_30s_bio_12.contribution 78.8980
wc2.1_30s_bio_17.contribution 4.6743
wc2.1_30s_bio_2.contribution 5.9832
wc2.1_30s_bio_4.contribution 1.5785
wc2.1_30s_bio_9.contribution 4.2235
wc2.1_30s_bio_1.permutation.importance 19.0363
wc2.1_30s_bio_12.permutation.importance 65.6388
wc2.1_30s_bio_17.permutation.importance 0.2608
wc2.1_30s_bio_2.permutation.importance 5.8696
wc2.1_30s_bio_4.permutation.importance 8.8200
wc2.1_30s_bio_9.permutation.importance 0.3746
Entropy 7.0786
Prevalence..average.probability.of.presence.over.background.sites. 0.5521
Fixed.cumulative.value.1.cumulative.threshold 1.0000
Fixed.cumulative.value.1.Cloglog.threshold 0.3309
Fixed.cumulative.value.1.area 0.9730
Fixed.cumulative.value.1.training.omission 0.0091
Fixed.cumulative.value.5.cumulative.threshold 5.0000
Fixed.cumulative.value.5.Cloglog.threshold 0.3789
Fixed.cumulative.value.5.area 0.8928
Fixed.cumulative.value.5.training.omission 0.0382
Fixed.cumulative.value.10.cumulative.threshold 10.0000
Fixed.cumulative.value.10.Cloglog.threshold 0.4252
Fixed.cumulative.value.10.area 0.8066
Fixed.cumulative.value.10.training.omission 0.0800
Minimum.training.presence.cumulative.threshold 0.2935
Minimum.training.presence.Cloglog.threshold 0.2570
Minimum.training.presence.area 0.9895
Minimum.training.presence.training.omission 0.0000
X10.percentile.training.presence.cumulative.threshold 12.6374
X10.percentile.training.presence.Cloglog.threshold 0.4468
X10.percentile.training.presence.area 0.7661
X10.percentile.training.presence.training.omission 0.1000
Equal.training.sensitivity.and.specificity.cumulative.threshold 40.2325
Equal.training.sensitivity.and.specificity.Cloglog.threshold 0.5374
Equal.training.sensitivity.and.specificity.area 0.4033
Equal.training.sensitivity.and.specificity.training.omission 0.4036
Maximum.training.sensitivity.plus.specificity.cumulative.threshold 48.0194
Maximum.training.sensitivity.plus.specificity.Cloglog.threshold 0.5964
Maximum.training.sensitivity.plus.specificity.area 0.3208
Maximum.training.sensitivity.plus.specificity.training.omission 0.4636
Balance.training.omission..predicted.area.and.threshold.value.cumulative.threshold 0.2935
Balance.training.omission..predicted.area.and.threshold.value.Cloglog.threshold 0.2570
Balance.training.omission..predicted.area.and.threshold.value.area 0.9895
Balance.training.omission..predicted.area.and.threshold.value.training.omission 0.0000
Equate.entropy.of.thresholded.and.original.distributions.cumulative.threshold 5.2220
Equate.entropy.of.thresholded.and.original.distributions.Cloglog.threshold 0.3809
Equate.entropy.of.thresholded.and.original.distributions.area 0.8891
Equate.entropy.of.thresholded.and.original.distributions.training.omission 0.0400
Predicting the relative suitability of the territory for our taxa based on our results
We evaluate the models using the evaluate function from the dismo package for the training and testing data.
mod_eval_train <- dismo::evaluate(p = p, a = a, model = mod)
mod_eval_test <- dismo::evaluate(p = p_test, a = a, model = mod)
# Evaluation of the training model
print(mod_eval_train)class : ModelEvaluation
n presences : 550
n absences : 786
AUC : 0.7526671
cor : 0.4630484
max TPR+TNR at : 0.5962506
class : ModelEvaluation
n presences : 236
n absences : 786
AUC : 0.7095247
cor : 0.3704079
max TPR+TNR at : 0.634419
How each environmental variable affects the Maxent prediction. They show how the logistic prediction changes as each environmental variable is varied, keeping all other environmental variables at their average sample value.